{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 多个输入和输出通道\n",
    "\n",
    ":label:`sec_channels`\n",
    "\n",
    "\n",
    "虽然我们已经描述了多个通道\n",
    "包括每个图像的（例如，彩色图像具有标准RGB通道）\n",
    "表示红色、绿色和蓝色的数量），\n",
    "到目前为止，我们简化了所有的数值例子\n",
    "只使用一个输入和一个输出通道。\n",
    "这让我们能够思考我们的输入，卷积核，\n",
    "并以二维数组的形式输出。\n",
    "\n",
    "当我们在混合中加入频道时，\n",
    "我们的输入和隐藏的表示\n",
    "两者都变成了三维阵列。\n",
    "例如，每个RGB输入图像的形状为$3\\times h\\times w$。\n",
    "我们称这个轴为通道尺寸，大小为3。\n",
    "在本节中，我们将更深入地了解\n",
    "具有多个输入和多个输出通道的卷积核。\n",
    "\n",
    "## 多输入通道\n",
    "\n",
    "当输入数据包含多个通道时，\n",
    "我们需要构造一个卷积核\n",
    "使用与输入数据相同数量的输入通道，\n",
    "这样它就可以与输入数据进行互相关。\n",
    "假设输入数据的通道数为$c_i$，\n",
    "卷积内核的输入通道数也需要是$c_i$。如果我们的卷积核的窗口形状是$k_h\\times k_w$，\n",
    "然后当$c_i=1$，时，我们可以考虑我们的卷积核\n",
    "就像形状$k_h\\times k_w$的二维数组。\n",
    "\n",
    "然而，当$c_i>1$时，我们需要一个内核\n",
    "对于每个输入通道，*它包含一个形状为$k_h\\times k_w$*的数组。\n",
    "将这些$c_i$数组连接在一起\n",
    "产生一个形状为$c_i\\times k_h\\times k_w$的卷积核。\n",
    "\n",
    "由于输入和卷积内核都有$c_i$通道，\n",
    "我们可以进行互相关运算\n",
    "关于输入的二维数组\n",
    "以及卷积核的二维核数组\n",
    "对于每个频道，将$c_i$结果相加\n",
    "（通过通道求和）\n",
    "生成二维数组。\n",
    "这是二维互相关的结果\n",
    "在多通道输入数据和\n",
    "一个*多输入通道*卷积内核。\n",
    "\n",
    "在 :numref:`fig_conv_multi_in`，我们举一个例子\n",
    "与两个输入通道的二维互相关。\n",
    "阴影部分是第一个输出元素\n",
    "以及计算中使用的输入和内核数组元素：\n",
    "$(1\\times1+2\\times2+4\\times3+5\\times4)+(0\\times0+1\\times1+3\\times2+4\\times3)=56$.\n",
    "\n",
    "![使用2个输入通道进行互相关计算。阴影部分是第一个输出元素以及计算中使用的输入和内核数组元素： $(1\\times1+2\\times2+4\\times3+5\\times4)+(0\\times0+1\\times1+3\\times2+4\\times3)=56$. ](https://raw.githubusercontent.com/d2l-ai/d2l-en/master/img/conv-multi-in.svg)\n",
    "\n",
    ":label:`fig_conv_multi_in`\n",
    "\n",
    "\n",
    "\n",
    "为了确保我们真正了解这里发生了什么，\n",
    "我们可以用多个输入通道实现互相关运算。\n",
    "请注意，我们所做的只是执行一个互相关操作\n",
    "然后使用`sum()`函数将结果相加。\n",
    "\n",
    "让我们先导入这些库，然后再开始讨论多个输入和输出通道的概念。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load ../utils/djl-imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "1"
    }
   },
   "outputs": [],
   "source": [
    "NDManager manager = NDManager.newBaseManager();\n",
    "\n",
    "public NDArray corr2D(NDArray X, NDArray K) {\n",
    "\n",
    "    long h = K.getShape().get(0);\n",
    "    long w = K.getShape().get(1);\n",
    "\n",
    "    NDArray Y = manager.zeros(new Shape(X.getShape().get(0) - h + 1, X.getShape().get(1) - w + 1));\n",
    "\n",
    "    for (int i = 0; i < Y.getShape().get(0); i++) {\n",
    "        for (int j = 0; j < Y.getShape().get(1); j++) {\n",
    "            NDArray temp = X.get(i + \":\" + (i + h) + \",\" + j + \":\" + (j + w)).mul(K);\n",
    "            Y.set(new NDIndex(i + \",\" + j), temp.sum());\n",
    "        }\n",
    "    }\n",
    "    return Y;\n",
    "}\n",
    "\n",
    "public NDArray corr2dMultiIn(NDArray X, NDArray K) {\n",
    "\n",
    "    long h = K.getShape().get(0);\n",
    "    long w = K.getShape().get(1);\n",
    "    \n",
    "    // 首先，沿着'X'的第0维（通道维）进行遍历\n",
    "    // “K”。然后，把它们加在一起\n",
    "\n",
    "    NDArray res = manager.zeros(new Shape(X.getShape().get(0) - h + 1, X.getShape().get(1) - w + 1));\n",
    "    for (int i = 0; i < X.getShape().get(0); i++) {\n",
    "        for (int j = 0; j < K.getShape().get(0); j++) {\n",
    "            if (i == j) {\n",
    "                res = res.add(corr2D(X.get(new NDIndex(i)), K.get(new NDIndex(j))));\n",
    "            }\n",
    "        }\n",
    "    }\n",
    "    return res;\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们可以构造输入数组`X`和内核数组`K`\n",
    "对应于上图中的值\n",
    "验证互相关操作的输出。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "2"
    }
   },
   "outputs": [],
   "source": [
    "NDArray X = manager.create(new Shape(2, 3, 3), DataType.INT32);\n",
    "X.set(new NDIndex(0), manager.arange(9));\n",
    "X.set(new NDIndex(1), manager.arange(1, 10));\n",
    "X = X.toType(DataType.FLOAT32, true);\n",
    "\n",
    "NDArray K = manager.create(new Shape(2, 2, 2), DataType.INT32);\n",
    "K.set(new NDIndex(0), manager.arange(4));\n",
    "K.set(new NDIndex(1), manager.arange(1, 5));\n",
    "K = K.toType(DataType.FLOAT32, true);\n",
    "\n",
    "corr2dMultiIn(X, K);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 多输出通道\n",
    "\n",
    "无论输入通道的数量如何，\n",
    "到目前为止，我们总是以一个输出通道结束。\n",
    "但是，正如我们之前讨论的，\n",
    "事实证明，在每一层有多个通道是必要的。\n",
    "在最流行的神经网络架构中，\n",
    "我们实际上增加了通道维度\n",
    "当我们在神经网络中往上走的时候，\n",
    "通常通过降低采样来权衡空间分辨率\n",
    "更大的*通道深度*。\n",
    "凭直觉，你可以想到每个频道\n",
    "作为对不同特征的回应。\n",
    "现实比对这种直觉的最天真的解释要复杂一些，因为表征并不是独立学习的，而是经过优化以共同有用的。\n",
    "因此，可能不是单个通道学习边缘检测器，而是通道空间中的某个方向对应于检测边缘。\n",
    "\n",
    "\n",
    "用$c_i$和$c_o$表示数字\n",
    "输入和输出通道，\n",
    "让$k_h$和$k_w$作为内核的高度和宽度。\n",
    "要获得具有多个通道的输出，\n",
    "我们可以创建一个内核数组\n",
    "形状为$c_i\\times k_h\\times k_w$\n",
    "对于每个输出通道。\n",
    "我们在输出通道维度上连接它们，\n",
    "所以卷积核的形状\n",
    "是$c_o\\times c_i\\times k_h\\times k_w$。\n",
    "在互相关运算中，\n",
    "计算每个输出通道上的结果\n",
    "来自对应于该输出通道的卷积内核\n",
    "并从输入阵列中的所有通道获取输入。\n",
    "\n",
    "我们实现了一个互相关函数\n",
    "计算多个通道的输出，如下所示。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "3"
    }
   },
   "outputs": [],
   "source": [
    "public NDArray corrMultiInOut(NDArray X, NDArray K) {\n",
    "\n",
    "    long cin = K.getShape().get(0);\n",
    "    long h = K.getShape().get(2);\n",
    "    long w = K.getShape().get(3);\n",
    "    \n",
    "    // 沿“K”的第0维遍历，每次执行\n",
    "    // 输入'X'的互相关运算。所有结果都是正确的\n",
    "    // 使用stack函数合并在一起\n",
    "\n",
    "    NDArray res = manager.create(new Shape(cin, X.getShape().get(1) - h + 1, X.getShape().get(2) - w + 1));\n",
    "\n",
    "    for (int j = 0; j < K.getShape().get(0); j++) {\n",
    "        res.set(new NDIndex(j), corr2dMultiIn(X, K.get(new NDIndex(j))));\n",
    "    }\n",
    "        \n",
    "    return res;\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们构造了一个具有3个输出通道的卷积核\n",
    "通过将内核数组`K`与`K+1`连接起来\n",
    "（`K`中每个元素加一个）和`K+2`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "4"
    }
   },
   "outputs": [],
   "source": [
    "K = NDArrays.stack(new NDList(K, K.add(1), K.add(2)));\n",
    "K.getShape()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面，我们执行互相关操作\n",
    "在输入数组`X`和内核数组`K`上。\n",
    "现在输出包含3个通道。\n",
    "第一个通道的结果是一致的\n",
    "使用上一个输入数组的结果`X`\n",
    "以及多输入通道，\n",
    "单输出通道内核。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "5"
    }
   },
   "outputs": [],
   "source": [
    "corrMultiInOut(X, K);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## $1\\times 1$ 卷积层\n",
    "\n",
    "首先，一个$1 \\times 1$的卷积，即 $k_h = k_w = 1$，\n",
    "这似乎没有多大意义。\n",
    "毕竟，卷积将相邻像素关联起来。\n",
    "一个$1 \\times 1$的卷积显然不是。\n",
    "尽管如此，它们还是很受欢迎的业务，有时也包括在内\n",
    "在复杂深层网络的设计中。\n",
    "让我们详细了解一下它的实际功能。\n",
    "因为使用了最小窗口，\n",
    "$1\\times 1$ 卷积将失去该能力\n",
    "更大的卷积层\n",
    "识别由相互作用组成的模式\n",
    "在高度和宽度尺寸中的相邻元素之间。\n",
    "只会进行$1\\times 1$的卷积运算\n",
    "在通道维度上。\n",
    "\n",
    ":numref:`fig_conv_1x1` 显示了互相关计算\n",
    "使用$1\\times 1$卷积内核\n",
    "有3个输入通道和2个输出通道。\n",
    "请注意，输入和输出具有相同的高度和宽度。\n",
    "输出中的每个元素都是派生的\n",
    "来自同一位置的元素的*线性组合*\n",
    "在输入图像中。\n",
    "你可以想象$1\\times 1$的卷积层\n",
    "构成一个应用于每个像素位置的完全连接层\n",
    "将$c_i$对应的输入值转换为$c_o$输出值。\n",
    "因为这仍然是一个卷积层，\n",
    "权重在像素位置上绑定。\n",
    "因此$1\\times 1$卷积层需要$c_o\\times c_i$权重\n",
    "(加上偏差项)。\n",
    "\n",
    "\n",
    "![The cross-correlation computation uses the $1\\times 1$ convolution kernel with 3 input channels and 2 output channels. The inputs and outputs have the same height and width. ](https://raw.githubusercontent.com/d2l-ai/d2l-en/master/img/conv-1x1.svg)\n",
    "\n",
    ":label:`fig_conv_1x1`\n",
    "\n",
    "\n",
    "让我们看看这在实践中是否有效：\n",
    "我们实现了$1 \\times 1$的卷积运算\n",
    "使用完全连接的层。\n",
    "唯一的问题是我们需要做一些调整\n",
    "矩阵乘法前后的数据形状。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "6"
    }
   },
   "outputs": [],
   "source": [
    "public NDArray corr2dMultiInOut1x1(NDArray X, NDArray K) {\n",
    "\n",
    "    long channelIn = X.getShape().get(0);\n",
    "    long height = X.getShape().get(1);\n",
    "    long width = X.getShape().get(2);\n",
    "\n",
    "    long channelOut = K.getShape().get(0);\n",
    "    X = X.reshape(channelIn, height * width);\n",
    "    K = K.reshape(channelOut, channelIn);\n",
    "    NDArray Y = K.dot(X); // 全连通层中的矩阵乘法\n",
    "\n",
    "    return Y.reshape(channelOut, height, width);\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "执行$1\\times 1$卷积时，\n",
    "上述函数相当于之前实现的互相关函数 `corrMultiInOut()`.\n",
    "让我们用一些参考数据来验证这一点。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "7"
    }
   },
   "outputs": [],
   "source": [
    "X = manager.randomUniform(0f, 1.0f, new Shape(3, 3, 3));\n",
    "K = manager.randomUniform(0f, 1.0f, new Shape(2, 3, 1, 1));\n",
    "\n",
    "NDArray Y1 = corr2dMultiInOut1x1(X, K);\n",
    "NDArray Y2 = corrMultiInOut(X, K);\n",
    "\n",
    "System.out.println(Math.abs(Y1.sum().getFloat() - Y2.sum().getFloat()) < 1e-6);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 总结\n",
    "\n",
    "* 可以使用多个通道来扩展卷积层的模型参数。\n",
    "* 当以每像素为基础应用时，$1\\times 1$卷积层相当于完全连接层。\n",
    "* $1\\times 1$卷积层通常用于调整网络层之间的通道数并控制模型复杂性。\n",
    "\n",
    "\n",
    "## 练习\n",
    "\n",
    "1. 假设我们有两个卷积核，大小分别为$k_1$和$k_2$（中间没有非线性）。\n",
    "    * 证明运算结果可以用一次卷积来表示。\n",
    "    * 等效单卷积的维数是多少？\n",
    "    * 反过来是真的吗？\n",
    "1. 假设输入形状为$c_i\\times h\\times w$，卷积核形状为$c_o\\times c_i\\times k_h\\times k_w$，填充为 $(p_h, p_w)$，步长为$(s_h, s_w)$。\n",
    "    * 正向计算的计算成本（乘法和加法）是多少？\n",
    "    * 内存占用是多少？\n",
    "    * 反向计算的内存占用是多少？\n",
    "    * 反向计算的计算成本是多少？\n",
    "1. 如果我们将输入通道数$c_i$和输出通道数$c_o$翻一番，计算的数量会增加多少？如果我们把填充物翻一番会怎么样？\n",
    "1. 如果卷积核的高度和宽度是$k_h=k_w=1$，那么正向计算的复杂度是多少？\n",
    "1. 本节最后一个示例中的变量`Y1`和`Y2`是否完全相同？为什么？\n",
    "1. 当卷积窗口不是$1\\times 1$时，如何使用矩阵乘法实现卷积？\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Java",
   "language": "java",
   "name": "java"
  },
  "language_info": {
   "codemirror_mode": "java",
   "file_extension": ".jshell",
   "mimetype": "text/x-java-source",
   "name": "Java",
   "pygments_lexer": "java",
   "version": "14.0.2+12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
